SQL Integration Plan - Complete SQL Migration

Last Updated: November 13, 2025
Status: Phase 0 (Infrastructure) complete - Phase 1 (Dual-Write) ready to start toward full SQL migration
Objective: Complete transition from file-based to SQL-based storage for all outputs, models, and configurations


Executive Summary

The ACM system is fully equipped for SQL-based operation. All infrastructure is in place:

Current State: Database schema complete, code supports dual-write, ZERO data in tables (fresh start)

Next Action: Enable dual-write mode and populate SQL database alongside file outputs for validation


Database Schema Status (Verified November 13, 2025)

Core Tables (21 tables ready)

BASE TABLES:
 Equipment                    -- Asset master data (0 rows - ready for population)
 Runs                         -- Pipeline execution tracking (0 rows)
 ModelRegistry                -- Trained model storage (0 rows)
 ConfigLog                    -- Configuration change audit trail (0 rows)
 Historian                    -- Raw time-series cache (0 rows)

TIME-SERIES OUTPUTS:
 ScoresTS                     -- Detector scores (fused_z, ar1_z, pca_spe_z, etc.)
 DriftTS                      -- Multi-feature drift signals
 PCA_ScoresTS                 -- PCA T² and SPE scores
 ForecastResidualsTS          -- AR1 residual tracking
 DataQualityTS                -- Data quality metrics over time

ANALYTICS TABLES:
 AnomalyEvents                -- Episode detection results
 RegimeEpisodes               -- Operating regime periods
 AnomalyTopSpikes             -- Top contributing sensors per episode
 XCorrTopPairs                -- Sensor correlation rankings
 FeatureImportance            -- Drift culprit analysis
 DriftSummary                 -- Drift change point summary
 CPD_Points                   -- Change point detection results
 RunStats                     -- Run-level quality metrics

MODEL PERSISTENCE:
 PCA_Model                    -- PCA model parameters
 PCA_Components               -- PCA loadings/components
 PCA_Metrics                  -- PCA quality metrics

Views (5 analytical views)

 v_Equip_Anomalies            -- Equipment anomaly summary
 v_Equip_DriftTS              -- Equipment drift timeline
 v_Equip_SensorTS             -- Equipment sensor time-series
 v_PCA_Loadings               -- PCA component interpretation
 v_PCA_Scree                  -- PCA variance explained plot data

Stored Procedures (19 write procedures)

CORE LIFECYCLE:
 usp_ACM_StartRun             -- Initialize pipeline run
 usp_ACM_FinalizeRun          -- Complete pipeline run

DATA WRITES:
 usp_Write_ScoresTS           -- Batch insert detector scores
 usp_Write_DriftTS            -- Batch insert drift signals
 usp_Write_AnomalyEvents      -- Write episode detections
 usp_Write_RegimeEpisodes     -- Write regime transitions
 usp_Write_AnomalyTopSpikes   -- Write culprit sensors
 usp_Write_XCorrTopPairs      -- Write correlation pairs
 usp_Write_FeatureImportance  -- Write drift culprits
 usp_Write_DriftSummary       -- Write drift summary
 usp_Write_CPD_Points         -- Write change points
 usp_Write_DataQualityTS      -- Write quality metrics
 usp_Write_ForecastResidualsTS -- Write forecast residuals
 usp_Write_ConfigLog          -- Write config changes
 usp_Write_RunStats           -- Write run statistics

PCA MODEL WRITES:
 usp_Write_PCA_Model          -- Persist PCA model
 usp_Write_PCA_Metrics        -- Write PCA quality metrics
 usp_Write_PCA_Loadings       -- Write PCA components
 usp_Write_PCA_ScoresTS       -- Write PCA scores
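
To confirm the schema is in place, a quick catalog check can be run from Python. This is a minimal sketch: the connection string is a placeholder for the real one in configs/sql_connection.ini, and the counts should be at least 21 / 5 / 19 if only ACM objects exist in the database.

import pyodbc

CONN_STR = "DSN=ACM;Trusted_Connection=yes"  # placeholder; real string lives in configs/sql_connection.ini
conn = pyodbc.connect(CONN_STR)
for obj_type, label in (("U", "tables"), ("V", "views"), ("P", "stored procedures")):
    count = conn.cursor().execute(
        "SELECT COUNT(*) FROM sys.objects WHERE type = ?", obj_type
    ).fetchone()[0]
    print(f"{label}: {count}")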

What's Been Implemented


Code Infrastructure (Ready for SQL Operation)

1. SQL Connection & Authentication

File: configs/sql_connection.ini (local, gitignored)

File: core/sql_client.py
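
As a sketch of what this pair provides (the INI section and key names here are assumptions, not the actual file layout):

import configparser
import pyodbc

ini = configparser.ConfigParser()
ini.read("configs/sql_connection.ini")
section = ini["acm"]  # assumed section name
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={section['server']};DATABASE={section['database']};"
    "Trusted_Connection=yes"  # Windows Auth, per the local setup
)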

2. Dual-Write Mode (Implemented)

File: core/output_manager.py (4547 lines)

File: core/acm_main.py
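
The dual-write contract, illustratively (this is a pattern sketch, not the actual output_manager.py code): the file write remains primary, and the SQL write is best-effort, so a SQL failure never breaks a run.

from typing import Callable, Optional

import pandas as pd

def write_scores(df: pd.DataFrame, file_path: str,
                 sql_writer: Optional[Callable[[pd.DataFrame], None]] = None) -> None:
    df.to_csv(file_path, index=False)   # primary output (file mode)
    if sql_writer is None:
        return                          # dual-write disabled
    try:
        sql_writer(df)
        print("[DUAL] SQL write succeeded")
    except Exception as exc:            # never let SQL failures break the run
        print(f"[DUAL] SQL write failed; file output preserved: {exc}")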

3. SQL Performance Optimization

File: core/sql_performance.py
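
The batch-write pattern such a module typically wraps looks like the sketch below; the function name and the RunID/Timestamp columns are assumptions (fused_z and EquipID come from the schema above), not the sql_performance.py API. pyodbc's fast_executemany flag packs the parameter rows into bulk ODBC calls.

import pyodbc

def batch_insert_scores(conn: pyodbc.Connection, rows: list[tuple]) -> None:
    """Bulk-insert (RunID, EquipID, Timestamp, fused_z) rows into ScoresTS."""
    cur = conn.cursor()
    cur.fast_executemany = True  # send all rows in bulk instead of one round-trip each
    cur.executemany(
        "INSERT INTO ScoresTS (RunID, EquipID, Timestamp, fused_z) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()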

4. Model Persistence Architecture

File: core/model_persistence.py

5. Configuration Management

File: utils/sql_config.py

Database: ConfigLog table (seeded via scripts/sql/40_seed_config.sql)

6. Equipment Discovery Integration

File: scripts/sql/25_equipment_discovery_procs.sql


Migration Phases (Revised for Complete SQL)

Phase 0: Infrastructure Setup (COMPLETE)

Status: Done (November 13, 2025)

Evidence:

# Connection test passed
python scripts\sql\verify_acm_connection.py
# Output: CONNECTED server=B19cl3pc\B19CL3PCQLSERVER db=ACM

Phase 1: Dual-Write Validation (NEXT - START HERE)

Objective: Run pipeline in dual-write mode, validate SQL outputs against file outputs

Duration: 1-2 weeks
Risk: Low (file output preserved as fallback)
Data Source: CSV files (existing)
Output: Files + SQL (both)

Implementation Steps:

Step 1.1: Enable Dual-Write Mode

# Update config.yaml or set via command line
# Add to config.yaml:
output:
  dual_mode: true
  backend: file  # Primary is still file

# Or use environment variable
$env:ACM_DUAL_MODE = "true"
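
A minimal sketch of how the flag might be resolved at startup, with the env var taking precedence over YAML (the exact lookup in core/acm_main.py may differ):

import os

def dual_mode_enabled(cfg: dict) -> bool:
    env = os.getenv("ACM_DUAL_MODE")
    if env is not None:
        return env.strip().lower() == "true"   # $env:ACM_DUAL_MODE wins
    return bool(cfg.get("output", {}).get("dual_mode", False))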

Step 1.2: Register Equipment

-- Add equipment to database
INSERT INTO Equipment (EquipCode, EquipName, Area, Unit, Status, CommissionDate)
VALUES 
  ('FD_FAN', 'Forced Draft Fan', 'Boiler', 'Unit 1', 1, '2024-01-01'),
  ('GAS_TURBINE', 'Gas Turbine GT-101', 'Power Generation', 'Unit 1', 1, '2024-01-01');

Step 1.3: Run Pipeline with Dual-Write

cd "c:\Users\bhadk\Documents\ACM V8 SQL\ACM"

# Run FD_FAN with dual-write enabled
python -m core.acm_main `
  --equip FD_FAN `
  --artifact-root artifacts `
  --enable-report `
  --mode file

# Expected behavior:
# - Reads CSV from data/
# - Writes CSV to artifacts/FD_FAN/run_*/
# - ALSO writes to SQL tables (ScoresTS, AnomalyEvents, etc.)
# - Console shows "[DUAL] SQL write succeeded" messages

Step 1.4: Validate SQL Data

-- Check that data was written to SQL
SELECT 'ScoresTS' AS TableName, COUNT(*) AS [Rows] FROM ScoresTS
UNION ALL SELECT 'AnomalyEvents', COUNT(*) FROM AnomalyEvents
UNION ALL SELECT 'DriftTS', COUNT(*) FROM DriftTS
UNION ALL SELECT 'PCA_Model', COUNT(*) FROM PCA_Model
UNION ALL SELECT 'Runs', COUNT(*) FROM Runs;

-- Should show thousands of rows in ScoresTS, dozens in other tables

Step 1.5: Compare File vs SQL Outputs

⏳ Create validation script: scripts/sql/validate_dual_write.py

# Sketch for validate_dual_write.py
import glob

import numpy as np
import pandas as pd
import pyodbc

# Load file output from the latest run directory (read_csv does not expand globs)
latest_run = sorted(glob.glob("artifacts/FD_FAN/run_*/scores.csv"))[-1]
file_scores = pd.read_csv(latest_run)

# Load SQL output
conn = pyodbc.connect(...)  # connection string from configs/sql_connection.ini
sql_scores = pd.read_sql("SELECT * FROM ScoresTS WHERE EquipID = 1", conn)

# Compare row counts and summary statistics (tolerate float round-trip noise)
assert file_scores.shape[0] == sql_scores.shape[0]
assert np.isclose(file_scores['fused_z'].mean(), sql_scores['fused_z'].mean())
# etc...

Success Criteria Phase 1:

Deliverables Phase 1:


Phase 2: Model Persistence in SQL (PARALLEL WITH PHASE 1)

Objective: Store trained models in ModelRegistry table instead of .joblib files

Duration: 1 week
Risk: Low (models can still serialize to JSON)

Implementation Steps:

Step 2.1: Enhance ModelVersionManager

# In core/model_persistence.py
class ModelVersionManager:
    def save_to_sql(self, sql_client, equip_id, run_id, models_dict, version=1):
        """
        Persist models to SQL ModelRegistry table.

        Args:
            sql_client: Active SQL connection
            equip_id: Equipment ID
            run_id: Current run UUID
            models_dict: Dict with 'ar1', 'pca', 'iforest', 'gmm', 'regimes'
            version: Model version number (increment on retrain)
        """
        cur = sql_client.cursor()
        for model_type, model_obj in models_dict.items():
            params_json = self._serialize_model(model_obj)
            stats_json = self._extract_stats(model_obj)

            cur.execute("""
                INSERT INTO ModelRegistry
                (ModelType, EquipID, Version, ParamsJSON, StatsJSON, RunID, EntryDateTime)
                VALUES (?, ?, ?, ?, ?, ?, GETUTCDATE())
            """, (model_type, equip_id, version, params_json, stats_json, run_id))
        sql_client.conn.commit()  # one commit for the whole batch

    def load_from_sql(self, sql_client, equip_id, model_type, version=None):
        """Load latest (or a specific version) model from SQL."""
        cur = sql_client.cursor()
        if version is None:
            cur.execute("""
                SELECT TOP 1 ParamsJSON FROM ModelRegistry
                WHERE EquipID = ? AND ModelType = ?
                ORDER BY Version DESC
            """, (equip_id, model_type))
        else:
            cur.execute("""
                SELECT ParamsJSON FROM ModelRegistry
                WHERE EquipID = ? AND ModelType = ? AND Version = ?
            """, (equip_id, model_type, version))
        row = cur.fetchone()
        # _deserialize_model is assumed as the counterpart of _serialize_model above
        return self._deserialize_model(row[0]) if row else None

Step 2.2: Update acm_main.py

# After model training, save to SQL
if sql_client and cfg.get('output', {}).get('persist_models_sql', False):
    model_mgr.save_to_sql(sql_client, equip_id, run_id, {
        'ar1': ar1_models,
        'pca': pca_model,
        'iforest': iforest_model,
        'gmm': gmm_model,
        'regimes': regime_kmeans
    })

Step 2.3: Test Model Round-Trip

# scripts/sql/test_model_persistence.py
# 1. Train models
# 2. Save to SQL
# 3. Load from SQL
# 4. Compare predictions (should match exactly)
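
A minimal sketch of that check, assuming the Step 2.1 methods and a fitted scikit-learn-style PCA object with a .transform() method:

import numpy as np

def check_round_trip(sql_client, mgr, equip_id, run_id, pca_model, X):
    mgr.save_to_sql(sql_client, equip_id, run_id, {"pca": pca_model})  # 2. save
    reloaded = mgr.load_from_sql(sql_client, equip_id, "pca")          # 3. load
    # 4. predictions from the reloaded model should match the original exactly
    assert np.allclose(pca_model.transform(X), reloaded.transform(X))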

Success Criteria Phase 2:


Phase 3: SQL-Only Mode (PRODUCTION)

Objective: Disable file outputs, use SQL as primary storage

Duration: 2-3 weeks
Risk: Medium (requires historian integration)
Data Source: XStudio_Historian (live data)
Output: SQL only (no file artifacts)

Implementation Steps:

Step 3.1: Historian Integration

# Already implemented in core/historian.py
# Wire into acm_main.py data loading section

if SQL_MODE and not dual_mode:
    # Load from historian instead of CSV
    from core.historian import HistorianClient
    hist_client = HistorianClient.from_ini('xstudio_historian')
    
    df_train, df_score = hist_client.fetch_equipment_tags_for_acm(
        equip_code=equip_code,
        start_time=train_start,
        end_time=score_end
    )

Step 3.2: Disable File Writes

# In core/output_manager.py
if SQL_MODE and not dual_mode:
    # Skip all file writes
    Console.info("[SQL] File writes disabled in SQL-only mode")
    return

Step 3.3: Equipment Scheduler

⏳ Create scripts/run_all_equipment.py:

# Loop through all active equipment
# For each:
#   1. Query latest run timestamp from Runs table
#   2. Fetch new data from historian since last run
#   3. Execute pipeline
#   4. Write results to SQL
#   5. Update Runs table with completion status
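
A minimal sketch of that loop; fetch_since and run_pipeline are injected placeholders for the historian fetch and the ACM pipeline entry point, not real ACM APIs.

import pyodbc

def run_all_equipment(conn: pyodbc.Connection, fetch_since, run_pipeline) -> None:
    cur = conn.cursor()
    cur.execute("SELECT EquipID, EquipCode FROM Equipment WHERE Status = 1")
    for equip_id, equip_code in cur.fetchall():
        last_run = conn.cursor().execute(
            "SELECT MAX(StartTimeUTC) FROM Runs WHERE EquipID = ?", equip_id
        ).fetchone()[0]
        df = fetch_since(equip_code, last_run)  # new historian data since last run
        run_pipeline(equip_code, df)            # executes pipeline and writes SQL
        # completion status lands in Runs via usp_ACM_FinalizeRun inside the run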

Step 3.4: Production Deployment

Success Criteria Phase 3:


Current Action Plan (Start NOW)

Week 1: Dual-Write Validation

Day 1-2:

  1. Update config to enable dual_mode: true
  2. Register 2 test equipment (FD_FAN, GAS_TURBINE) in Equipment table
  3. Run 3 dual-write pipelines (FD_FAN, GAS_TURBINE, repeat)
  4. Verify SQL tables populated

Day 3-4:

  5. Create validation script (validate_dual_write.py)
  6. Compare file vs SQL outputs (row counts, statistics, keys)
  7. Measure performance baseline
  8. Fix any schema mismatches or write errors

Day 5:

  9. Run 10 dual-write cycles on multiple equipment
  10. Document validation results
  11. Tune SQL batch writer performance

Week 2: Model Persistence

Day 6-8:

  12. Implement save_to_sql() in ModelVersionManager
  13. Implement load_from_sql() with version selection
  14. Test model round-trip (save → load → predict)

Day 9-10:

  15. Integrate model persistence into main pipeline
  16. Run 5 training cycles, verify models in ModelRegistry
  17. Test cold-start with SQL-loaded models

Week 3: Planning Phase 3

Day 11-15:

  18. Review historian integration code (core/historian.py)
  19. Plan equipment scheduler architecture
  20. Design monitoring dashboard queries
  21. Plan production deployment (scheduler, alerts, backups)


Key Design Decisions (Updated)

1. Dual-Write as Safety Net

2. Model Storage Strategy

3. Time-Series Storage

4. Performance Optimization

5. Equipment Master Data

6. Configuration Hierarchy

Priority (highest to lowest):

  1. SQL ConfigLog table (runtime overrides)
  2. SQL default config (seeded via 40_seed_config.sql)
  3. CSV config_table.csv (legacy fallback)
  4. YAML config.yaml (base defaults)
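
Resolution walks the layers in that order and returns the first hit; this is a sketch with illustrative names, not the utils/sql_config.py API.

def resolve(key, config_log, sql_defaults, csv_table, yaml_base):
    """Return the highest-priority value for key across the four config layers."""
    for layer in (config_log, sql_defaults, csv_table, yaml_base):
        if key in layer:
            return layer[key]
    raise KeyError(f"{key} not found in any config layer")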

SQL Schema Design Principles

Normalized Structure

Time-Series Best Practices

Model Versioning


Risk Mitigation

Risk 1: SQL Write Performance

Risk 2: Schema Changes

Risk 3: Database Downtime

Risk 4: Data Volume Growth


Testing Strategy

Unit Tests

Integration Tests

Performance Tests


Success Metrics

Phase 1 (Dual-Write):

Phase 2 (Model Persistence):

Phase 3 (SQL-Only Production):


Rollback Plan

If SQL integration fails at any phase:

Phase 1 Rollback:

# Disable dual-write
$env:ACM_DUAL_MODE = "false"
# Or remove from config.yaml

Impact: Zero (file mode still works)

Phase 2 Rollback:

# In config.yaml, disable SQL model persistence
output:
  persist_models_sql: false

Impact: Models saved to .joblib files instead

Phase 3 Rollback:

# Re-enable file mode, disable SQL-only
python -m core.acm_main --mode file --equip FD_FAN

Impact: Returns to CSV file processing


Next Steps (Immediate Actions)

1. Enable Dual-Write (TODAY)

# Edit config.yaml
output:
  dual_mode: true

# Register equipment
sqlcmd -S localhost\B19CL3PCQLSERVER -E -d ACM -Q "
INSERT INTO Equipment (EquipCode, EquipName, Area, Unit, Status)
VALUES ('FD_FAN', 'Forced Draft Fan', 'Boiler', 'Unit 1', 1)"

# Run pipeline
python -m core.acm_main --equip FD_FAN --artifact-root artifacts --enable-report

2. Validate Outputs (THIS WEEK)

-- Check SQL data
SELECT COUNT(*) FROM ScoresTS;
SELECT COUNT(*) FROM AnomalyEvents;
SELECT * FROM Runs ORDER BY StartTimeUTC DESC;

3. Implement Model Persistence (NEXT WEEK)

4. Plan Phase 3 (NEXT 2 WEEKS)


File Structure Summary (new and modified files)

configs/
  sql_connection.ini          # Multi-database connections
  config.yaml                 # Legacy fallback (kept)

core/
  sql_client.py              # Enhanced for multi-DB
  historian.py               # NEW - Historian client
  acm_main.py                # Modified _load_config()
  data_io.py                 # SQL writers (already exist)

utils/
  sql_config.py              # NEW - SQL config reader/writer

scripts/sql/
  40_seed_config.sql         # NEW - Config seeding
  test_config_load.py        # NEW - Test script
  (other SQL scripts)         # Already exist from previous work

docs/sql/
  XHS_*.sql                  # Historian SPs (for reference)

Appendix: File Structure Summary

ACM/
├── configs/
│   ├── sql_connection.ini           Local SQL connection (Windows Auth)
│   ├── config.yaml                  Base config with dual_mode flag
│   └── config_table.csv             Legacy config fallback
│
├── core/
│   ├── acm_main.py                  Main pipeline (dual-write ready, line 658)
│   ├── sql_client.py                SQL connection manager (Windows Auth)
│   ├── output_manager.py            Dual-write coordinator (4547 lines)
│   ├── sql_performance.py           Batch writer + performance monitor
│   ├── model_persistence.py         Model versioning (ready for SQL)
│   └── historian.py                 Historian client (Phase 3)
│
├── utils/
│   ├── sql_config.py                SQL config reader/writer
│   └── logger.py                    Console logging
│
├── scripts/sql/
│   ├── 00_create_database.sql       Create ACM database
│   ├── 10_core_tables.sql           21 tables (Equipment, Runs, etc.)
│   ├── 15_config_tables.sql         Config + audit tables
│   ├── 20_stored_procs.sql          19 write procedures
│   ├── 25_equipment_discovery_procs.sql  DOW integration
│   ├── 30_views.sql                 5 analytical views
│   ├── 40_seed_config.sql           Config seeding script
│   ├── verify_acm_connection.py     Connection test script
│   ├── test_config_load.py          Config validation script
│   ├── validate_dual_write.py      ⏳ TO CREATE (Phase 1)
│   └── test_model_persistence.py   ⏳ TO CREATE (Phase 2)
│
├── data/                            CSV input files (Phase 1 data source)
│   ├── FD FAN TRAINING DATA.csv
│   └── Gas Turbine Training Data Set.csv
│
└── artifacts/                       File outputs (dual-write destination)
    └── {EQUIP}/
        ├── run_{timestamp}/         CSV/JSON/PNG outputs
        └── models/                  .joblib model cache

Summary & Recommendations

What works NOW:

What's preserved:

What's next (IMMEDIATE ACTIONS):

  1. Enable dual-write mode in config.yaml (set output.dual_mode: true)
  2. Register equipment in Equipment table (FD_FAN, GAS_TURBINE)
  3. Run 3-5 pipelines with dual-write enabled
  4. Verify SQL tables populated with correct data
  5. Create validation script to compare file vs SQL outputs
  6. Measure performance baseline (target <15s SQL writes)
  7. Implement model persistence to ModelRegistry table
  8. Plan Phase 3 historian integration and SQL-only mode

END OF REVISED SQL INTEGRATION PLAN